
Add sensor driver extraction toolkit and trace --output= flag#145

Merged
widgetii merged 1 commit into master from feat/sensor-driver-extraction
May 3, 2026

Conversation

Member

@widgetii widgetii commented May 3, 2026

Summary

  • Add --output=PATH to ipctool trace: parent freopens stdout after fork so streamer logs (Majestic, Sofia) no longer interleave with trace pseudocode and corrupt register-write lines on shared fd 1
  • Add a post-processing pipeline under tools/ (pure Python stdlib) that turns a captured trace into a buildable HiSilicon-shaped sensor driver scaffold and diffs it against a known reference
  • Add tools/capture_sensor.sh wrapping the camera-side flow for both OpenIPC/Majestic (ssh) and XiongMai/Sofia (telnet + bind-mount over /usr/bin/Sofia)
  • Add docs/sensor-driver-extraction.md — first researcher-oriented doc in the project, covering the full workflow plus the non-obvious gotchas

Why

The trace command has been around for a while but the README has it as a single line: ipctool trace /usr/bin/Sofia. In practice, getting a usable sensor init out of a running streamer requires:

  • a UPX-packed binary (XiongMai HiLinux 4.9 kernels reject raw musl-static ELFs with ENOEXEC, busybox sh then tries to interpret the ELF as a script and dies with syntax error: unexpected word (expecting ")"))
  • redirecting trace output away from the child's stdout (otherwise streamer logs interleave and truncate trace lines mid-write)
  • understanding that on XiongMai firmwares, XmServices_Mgr is not a supervisor — it forks SofiaRun.sh once at boot and doesn't restart it; the watchdog is fed by XmServices_Mgr, not by Sofia
  • redirecting stdin to /dev/null for a backgrounded ptraced chain to avoid SIGTTIN

This PR captures all of that as code and prose so the next researcher doesn't have to rediscover it.
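The fd-1 separation behind --output= can be illustrated with a minimal Python sketch. This is not the C code in src/ptrace.c; `run_traced`, `child_fn` and `parent_fn` are hypothetical names, and `os.dup2` stands in for the `freopen` call the PR describes. The point it shows is that file descriptor tables are per-process after fork, so redirecting the parent's fd 1 leaves the child's fd 1 where it was.

```python
import os
import sys


def run_traced(output_path, child_fn, parent_fn):
    """Fork, leave the child's fd 1 alone, point only the parent's fd 1 at output_path."""
    sys.stdout.flush()  # avoid carrying buffered text into the child
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the traced streamer; fd 1 is whatever it inherited.
        try:
            child_fn()
        finally:
            os._exit(0)

    # Parent: stands in for the tracer. Redirect its own fd 1 after the fork.
    saved = os.dup(1)
    fd = os.open(output_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.dup2(fd, 1)
    os.close(fd)
    try:
        parent_fn()
    finally:
        os.dup2(saved, 1)  # restore so the caller's stdout keeps working
        os.close(saved)
        os.waitpid(pid, 0)
```

Whatever the child writes to fd 1 still lands on the original stdout, while the parent's writes end up only in the file, so the two streams cannot interleave.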

Validation

End-to-end on SmartSens SC2315E + HiSilicon HI3516EV200, using widgetii/smart_sc2315e as ground truth:

capture source                         address   value   LCS sequence
OpenIPC Majestic (libsns_sc2315e.so)   100%      100%    100%
XiongMai Sofia (statically linked)     100%      100%    100%
Majestic vs Sofia (cross-check)        100%      100%    100%

172 register writes match exactly across all three. Both streamers ultimately load the same vendor sc2315e_cmos.c driver, and ipctool trace recovers it byte-for-byte.
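For readers unfamiliar with the three columns, here is one plausible way to compute them, as a sketch only. The exact definitions used by tools/trace_diff.py are not stated in this PR, so the formulas below (address coverage relative to the reference, value agreement on shared addresses, LCS length over the longer sequence) are assumptions, as are the names `lcs_length` and `compare_writes`.

```python
def lcs_length(a, b):
    """Classic O(len(a) * len(b)) longest-common-subsequence length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def compare_writes(gen, ref):
    """gen/ref: lists of (addr, value) in write order. Returns three percentages."""
    gen_last, ref_last = dict(gen), dict(ref)  # last write to each address wins
    shared = gen_last.keys() & ref_last.keys()
    address = 100.0 * len(shared) / len(ref_last) if ref_last else 100.0
    same = sum(1 for a in shared if gen_last[a] == ref_last[a])
    value = 100.0 * same / len(shared) if shared else 0.0
    longest = max(len(gen), len(ref))
    sequence = 100.0 * lcs_length(gen, ref) / longest if longest else 100.0
    return address, value, sequence
```

Under these definitions, a 100/100/100 row means every reference register was written, with the same final value, in the same order.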

File map

src/main.c                                + usage string for --output=
src/ptrace.c                              + --output=PATH flag
docs/sensor-driver-extraction.md          + full researcher manual
tools/capture_sensor.sh                   + build / majestic / sofia subcommands
tools/trace_segment.py                    + pseudocode -> phase JSON
tools/trace_to_driver.py                  + JSON -> C scaffold
tools/trace_diff.py                       + scoped diff vs reference
README.md                                 ~ link to new docs
.gitignore                                ~ exclude tools/dumps and build-arm*

Test plan

  • tools/capture_sensor.sh build produces a UPX-packed binary
  • tools/capture_sensor.sh majestic --host <openipc-cam> round-trips capture + restart
  • tools/capture_sensor.sh sofia --host <xm-cam> produces a clean trace and reboots cleanly
  • python3 tools/trace_segment.py <log> produces non-empty init phase
  • python3 tools/trace_diff.py <gen.c> <ref.c> --gen-scope ... --ref-scope ... reports ≥90% address match for a sensor with a known reference
  • Existing ipctool trace invocations (without --output=) behave unchanged

🤖 Generated with Claude Code

ipctool trace already decodes sensor I2C/SPI/MIPI/VI traffic into
C-pseudocode, but the output goes to the same stdout the traced child
uses, so verbose streamers (Majestic, Sofia) interleave their own log
lines with ours and occasionally truncate mid-token.

* `--output=PATH` flag: parent freopens stdout after fork; the child
  keeps its inherited fd 1 untouched so its logs go where they always
  did. Default behaviour unchanged.

* New post-processing pipeline under tools/ (no external deps):
  trace_segment.py splits a captured trace into pre_sensor/init/
  post_init/runtime phases via reset/stream-on detection plus
  write-frequency heuristics; trace_to_driver.py emits a
  HiSilicon-shaped C scaffold (sensor_linear_init +
  post_init_exposure_prime); trace_diff.py compares against a known
  reference with --gen-scope/--ref-scope so AE-overwritten regs don't
  pollute the diff.

* tools/capture_sensor.sh wraps the camera-side flow for both
  OpenIPC/Majestic (ssh) and XiongMai/Sofia (telnet + bind-mount over
  /usr/bin/Sofia, since XmServices_Mgr is not a real supervisor).
  The build subcommand downloads the canonical OpenIPC toolchain and
  UPX-packs the binary, which is required for the XiongMai HiLinux
  kernel to load it (raw musl-static ELFs hit ENOEXEC there).

* docs/sensor-driver-extraction.md documents the full workflow,
  including the non-obvious gotchas (UPX requirement, XmServices_Mgr
  behaviour, SIGTTIN on backgrounded ptraced chains, fd-1 sharing).
  Validated end-to-end on SC2315E + HI3516EV200: 100% address / 100%
  value / 100% LCS sequence match against widgetii/smart_sc2315e
  reference, identical from both Majestic and Sofia captures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@widgetii widgetii merged commit b5f1b8d into master May 3, 2026
2 checks passed
@widgetii widgetii deleted the feat/sensor-driver-extraction branch May 3, 2026 08:30
widgetii added a commit that referenced this pull request May 3, 2026
…146)

trace_to_driver.py output now passes `gcc -Wall -Wextra -fsyntax-only`
without any vendor headers. Replaces the previous stub `extern void
<sensor>_write_register(...)` (which was never called) with a small
"SDK stubs" block: a `typedef int VI_PIPE` and a no-op
`sensor_write_register(addr, val)`. Comment block instructs how to
swap the stubs out for hi_comm_video.h / hi_sns_ctrl.h plus the
vendor's bus-aware implementation when integrating into a HiSilicon
SDK build. `(void)ViPipe` cast inside each function silences the
unused-parameter warning that the vendor's macro form would resolve.
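A rough sketch of what such a generator step might look like follows. It is not the code in trace_to_driver.py: `emit_init`, the `int` parameter types of the stub and the exact comment wording are assumptions; only the `typedef int VI_PIPE`, the no-op `sensor_write_register(addr, val)` and the `(void)ViPipe` cast come from the description above.

```python
SDK_STUBS = """\
/* SDK stubs: swap for hi_comm_video.h / hi_sns_ctrl.h when integrating. */
typedef int VI_PIPE;
static void sensor_write_register(int addr, int val) { (void)addr; (void)val; }
"""


def emit_init(sensor, writes):
    """Render a list of (addr, value) writes as a standalone C init function."""
    lines = [
        SDK_STUBS,
        f"void {sensor}_linear_init(VI_PIPE ViPipe)",
        "{",
        "    (void)ViPipe; /* silences the unused-parameter warning */",
    ]
    lines += [f"    sensor_write_register(0x{a:04x}, 0x{v:02x});" for a, v in writes]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Because the stubs define everything the function body references, the emitted file has no dependency on vendor headers.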

tools/test_pipeline.sh runs the full segment -> generate -> compile
flow on a small synthetic trace and exits non-zero on any failure.
Wired into pr-build-check.yml as test-extraction-pipeline, so a
regression in any of the Python scripts that breaks the generator
output is caught at PR time without hardware.

Default-mode `ipctool trace` (no --output=) regression-checked on a
real Majestic camera: identical output structure to the pre-#145
behaviour, with the well-understood fd-1 corruption pattern that
motivated --output= in the first place. Confirms #145 is opt-in and
the default codepath is unchanged.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 3, 2026
…cs (#147)

Two changes that share a theme: making the AE/AGC layer of an
extracted sensor driver navigable. The first (write-side) finishes a
deferred item from #145; the second (read-side) documents an
existing-but-undocumented ipctool feature.

Write-side: trace_to_driver.py now emits a third function
`<sensor>_ae_step` that writes each runtime hot register (top 8,
above 25% of the max count) with its last-seen trace value, in
trace order, each tagged `/* TODO: derive */`. Header documents that
the values are placeholders and points at the vendor's
cmos_inttime_update / cmos_gains_update equivalents for the math.
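The hot-register selection rule can be sketched as below. This is an illustration, not the trace_to_driver.py code: `hot_registers` is a hypothetical name, "above 25%" is read as strictly greater than a quarter of the highest write count, and "trace order" is read as order of first appearance; both readings are assumptions.

```python
from collections import Counter


def hot_registers(runtime_writes, top_n=8, min_fraction=0.25):
    """Pick frequently written runtime registers and their last-seen values.

    runtime_writes: list of (addr, value) in trace order.
    Returns [(addr, last_value), ...] ordered by first appearance in the trace.
    """
    counts = Counter(addr for addr, _ in runtime_writes)
    if not counts:
        return []
    threshold = min_fraction * max(counts.values())
    hot = {a for a, c in counts.most_common(top_n) if c > threshold}

    last = {}
    for addr, val in runtime_writes:
        if addr in hot:
            last[addr] = val  # keeps first-insertion order, updates the value
    return list(last.items())
```

Each returned pair would become one `/* TODO: derive */` write in the emitted function.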

Cross-checked against widgetii/smart_sc2315e on the SC2315E + Majestic
capture from #145: skeleton emits exactly 0x3314, 0x5781, 0x5785 with
values 0x14, 0x60, 0x30 - matching the else-branches of the reference's
two AE callbacks. The trace was captured under steady ambient light,
so only the low-gain / short-inttime branch values appear.

Read-side: `ipctool sensor monitor` already supported SC2315E (added
some time ago); the feature was never documented anywhere. It reads
the same hot register set in a loop while the sensor runs, giving you
value time-series under varying lighting - the natural complement to
the static trace-extraction. New "Stage 4 - Live-reading the AE state"
section in docs/sensor-driver-extraction.md walks through pairing the
two: extract `_ae_step` for the register set, then `monitor` while
varying lighting to capture the value distribution that derives the
threshold conditionals. Renamed sc2315e's R3812 to HOLD (it is the
group-hold trigger from cmos_gains_update) so the live-read output is
self-explanatory.

test_pipeline.sh asserts the new function is emitted and the file
still passes gcc -Wall -Wextra. Docs updated to describe the third
function and the inherent limitation (math not in trace).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 3, 2026
Closes the third "real gap" from #145's plan-vs-shipped audit:
MIPI/VI structures captured by ipctool trace are now emitted as
proper C struct initializers in the generated scaffold, not just
embedded as multi-line comments inside the init function.

Two pieces:

trace_segment.py:
  Existing `RE_STRUCT_OPEN` only matched single-identifier openers
  (`name = {`), so the actual two-word ipctool dumps (`type name = {`,
  e.g. `combo_dev_attr_t SENSOR_ATTR = {`) fell through to plain text
  and the orphan `};` at the end leaked into the next phase. Tightened
  the regex to require `\w+\s+\w+\s*=\s*\{$`. Side effect: init phase
  on the SC2315E + Majestic regression capture goes 173 -> 172 events
  (the spurious leaked `};` no longer counts toward init), value/diff
  unchanged at 100/100/100% vs widgetii/smart_sc2315e.
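
  A small sketch of how the two-word opener pattern can keep a struct dump together as one event. The regex body is the one quoted above, with named groups added; `group_struct_events`, the event dictionaries and the `.input_mode` field in the usage note are illustrative assumptions, not the trace_segment.py implementation.

```python
import re

# Two-word opener, e.g. "combo_dev_attr_t SENSOR_ATTR = {"
RE_STRUCT_OPEN = re.compile(r"^\s*(?P<type>\w+)\s+(?P<name>\w+)\s*=\s*\{$")


def group_struct_events(lines):
    """Fold each `type name = {` ... `};` run into a single struct event."""
    events, current = [], None
    for line in lines:
        stripped = line.rstrip()
        if current is None:
            m = RE_STRUCT_OPEN.match(stripped)
            if m:
                current = {"kind": "struct", "type": m["type"],
                           "name": m["name"], "body": []}
            else:
                events.append({"kind": "text", "line": stripped})
        elif stripped.strip() == "};":
            events.append(current)  # the closer is consumed, so it cannot leak
            current = None
        else:
            current["body"].append(stripped)
    if current is not None:  # unterminated struct: keep what was captured
        events.append(current)
    return events
```

  Feeding it `combo_dev_attr_t SENSOR_ATTR = {`, a field line such as `.input_mode = INPUT_MODE_MIPI,`, `};` and one register write yields two events: one struct and one text line, with no stray `};`.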

trace_to_driver.py:
  New `collect_structs()` walks all phases and dedupes by variable name
  (last-seen value wins). `emit_structs_block()` writes them at file
  scope between the SDK stubs and the init function, wrapped in
  `#if 0 / #endif` so the scaffold continues to pass
  `gcc -Wall -Wextra -fsyntax-only` standalone (the struct bodies
  reference vendor-only enums like INPUT_MODE_MIPI). User removes the
  guard when integrating into a HiSilicon SDK build, where
  hi_comm_video.h / hi_mipi.h provide the types and enums.
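
  The dedupe-then-guard behaviour might look roughly like this. The two function names come from the description above, but the bodies, the event dictionary shape (`kind`, `type`, `name`, `body`) and the guard comment text are assumptions made for illustration.

```python
def collect_structs(phases):
    """phases: {phase_name: [event, ...]}. Dedupe struct events by variable name."""
    latest = {}
    for events in phases.values():
        for ev in events:
            if ev.get("kind") == "struct":
                latest[ev["name"]] = ev  # a later dump replaces an earlier one
    return list(latest.values())


def emit_structs_block(structs):
    """Render structs at file scope, guarded so the file compiles without vendor headers."""
    if not structs:
        return ""
    out = ["#if 0 /* remove this guard once the vendor headers are included */"]
    for s in structs:
        out.append(f"{s['type']} {s['name']} = {{")
        out.extend(s["body"])
        out.append("};")
    out.append("#endif")
    return "\n".join(out) + "\n"
```

  The guard keeps vendor-only identifiers out of the compiler's sight until the real headers are available.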

Note documented in the comment block: ipctool's V4 VI-dev dumper at
src/hal/hisi/ptrace.c:771 emits the variable name in
`pstViDevAttr VI_DEV_ATTR_S` order (reversed from SDK convention
where VI_DEV_ATTR_S is the type). Not fixed here - changing the dump
format would break existing parsers; the user can rename when
integrating.

test_pipeline.sh extended to include a struct event in the synthetic
trace and assert the file-scope declaration plus #if 0 wrapper are
emitted. Docs section updated.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 3, 2026
)

Closes the fourth real gap from #145's plan-vs-shipped audit, with one
caveat documented in the docs and tested empirically below.

trace_segment.py: new find_mode_switches() walks the trace after
init_end watching for `0x100=0 ... 0x100=1` cycles. Each cycle becomes
a mode_switch_N phase. The runtime/post_init anchors move to the last
0x100=1, so traces without a mode switch are unchanged.
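
A minimal sketch of the cycle detection, assuming the trace has already been reduced to a list of (addr, value) writes. The function name matches the one above, but the signature, the half-open index ranges and the `STREAM_REG` constant are assumptions rather than the trace_segment.py code.

```python
STREAM_REG = 0x100  # stream on/off register named in the description above


def find_mode_switches(writes, init_end):
    """Return [(start, end), ...] index ranges of 0x100=0 ... 0x100=1 cycles.

    writes: list of (addr, value) in trace order.
    init_end: index just past the init phase; earlier writes are ignored.
    Each range is half-open and includes both the stop and the restart write.
    """
    switches, start = [], None
    for i in range(init_end, len(writes)):
        addr, val = writes[i]
        if addr != STREAM_REG:
            continue
        if val == 0 and start is None:
            start = i  # stream stopped: a candidate mode switch begins
        elif val == 1 and start is not None:
            switches.append((start, i + 1))  # stream restarted: cycle complete
            start = None
    return switches
```

A trace whose only 0x100 write is the one that ends init yields an empty list, which is why captures without a mode switch segment exactly as before.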

trace_to_driver.py: emits one `<sensor>_set_mode_N` function per
mode-switch phase, same shape as `_linear_init`.

test_pipeline.sh: synthetic fixture extended with a mode switch
(0x100=0 / VMAX-rewrite / 0x100=1). CI asserts `_set_mode_1` is
emitted in the generated C and that it still passes
gcc -Wall -Wextra and the self-diff.

Empirical caveat: capture-side, mode switches require a streamer
that supports runtime sensor reconfiguration. Majestic does not
(config goes through .ini files + restart, each mode is a separate
cold-init capture). Sofia does, via the DVR-IP protocol; python-dvr
exposes the knobs. But whether a given knob causes a sensor-side
reconfigure is sensor-specific - Sofia's BroadTrends path lands in
software-side gain on most sensors. On SC2315E specifically,
toggling AutoGain 0->1->0 while ipctool trace was watching produced
*zero* additional 0x100 cycles - the sensor stays in linear mode.
Sofia's supported-sensor list confirms `SC2315_WDR` is a separate
entry from `SC2315E`, so no WDR firmware exists for that combo.
The segmenter and generator are validated end-to-end on the
synthetic fixture; live validation with a multi-mode sensor is
deferred to whoever has hardware where Sofia drives a sensor with
a `_WDR` variant.

Docs: new "Capturing mode switches" section in
sensor-driver-extraction.md walks through Majestic vs Sofia, the
python-dvr API, the empirical SC2315E finding, and the heuristic's
known blind spot (group-hold based hot swaps that don't toggle 0x100).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
widgetii added a commit that referenced this pull request May 3, 2026
Closes the sixth gap from #145's plan-vs-shipped audit. The plan called
for triangulating the trace against widgetii/sc_sc2315e (the older
reverse-engineered port from the SC2235 SDK template, predating
widgetii/smart_sc2315e). Two small relaxations in trace_diff.py let
that diff run cleanly:

* extract_function_body() previously hard-required `void <fn>` to
  open. The RE port's init returns int (`int sc2235_init(VI_PIPE)`),
  so the scope flag silently matched zero writes. Relaxed the regex
  to accept any whitespace-delimited rettype/qualifier sequence.

* RE_ANY_WRITE expected the register-write call name to end at
  `write_register(`. The RE port uses `sensor_write_register_0(...)`
  (bus-numbered suffix). Allowed an optional `\w*` after `register`
  before the open paren.
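
Both relaxations can be sketched as follows. These are illustrative patterns, not the ones in trace_diff.py: the exact regexes, the optional leading identifier argument and the helper `writes_in` are assumptions. The brace matcher is naive and does not skip braces inside comments or string literals.

```python
import re

# Any call whose name contains "write_register", with an optional suffix such as _0.
# An optional leading identifier argument (e.g. ViPipe) is tolerated.
RE_ANY_WRITE = re.compile(
    r"\b\w*write_register\w*\s*\(\s*(?:[A-Za-z_]\w*\s*,\s*)?"
    r"(0x[0-9a-fA-F]+)\s*,\s*(0x[0-9a-fA-F]+)\s*\)"
)


def extract_function_body(source, fn):
    """Return the text between the braces of `fn`, whatever its return type."""
    head = re.compile(
        r"^[ \t]*(?:\w+[ \t*]+)+" + re.escape(fn) + r"\s*\([^)]*\)\s*\{", re.M
    )
    m = head.search(source)
    if not m:
        return None
    depth, i = 1, m.end()
    while i < len(source) and depth:
        depth += {"{": 1, "}": -1}.get(source[i], 0)
        i += 1
    return source[m.end():i - 1] if depth == 0 else None


def writes_in(source, fn):
    """List of (addr, value) register writes inside function `fn`."""
    body = extract_function_body(source, fn)
    if body is None:
        return []
    return [(int(a, 16), int(v, 16)) for a, v in RE_ANY_WRITE.findall(body)]
```

With these patterns an `int`-returning init that calls `sensor_write_register_0(...)` yields the same write list as a `void` function calling `sensor_write_register(...)`.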

Result: pair-wise diffs between trace, smart_sc2315e, and sc_sc2315e
all show 100/100/100% (172 writes, 169 unique regs, identical values,
identical order). Three independent artifacts converge on the same
canonical SC2315E init - vendor binary, OpenIPC port from SC2231
template, RE port from SC2235 template. No drift, no missing regs.

test_pipeline.sh extended with a synthetic ref using the older
RE-port shape (int-returning function, _0-suffix call) so the relaxed
regex never regresses. Self-diff (current shape) and cross-style
(older shape) both asserted at 100% address match.

Docs: new "Triangulating against multiple references" section walks
through the three-way comparison procedure and the SC2315E result.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>